A Factored Language Model for Prosody Dependent Speech Recognition
Abstract
Prosody refers to the suprasegmental features of natural speech (such as rhythm and intonation) that are used to convey linguistic and paralinguistic information (such as emphasis, intention, attitude, and emotion). Humans listening to natural prosody, as opposed to monotone or foreign prosody, are able to understand the content with lower cognitive load and higher accuracy (Hahn, 1999). In automatic speech understanding systems, prosody has previously been used to disambiguate syntactically distinct sentences with identical phoneme strings (Price et al., 1991), infer the punctuation of a recognized text (Kim & Woodland, 2001), segment speech into sentences and topics (Shriberg et al., 2000), recognize dialog act labels (Taylor et al., 1997), and detect speech disfluencies (Nakatani & Hirschberg, 1994). None of these applications uses prosody to improve word recognition; that is, their word recognition modules do not use any prosodic information. Chen et al. (2003) proposed a prosody dependent speech recognizer that uses prosody to improve word recognition accuracy. In their approach, the task of speech recognition is to find the sequence of word labels $W = (w_1, \ldots, w_M)$ that maximizes the recognition probability.
Related papers
Using Prosodic Features in Language Models for Meetings
Prosody has been actively studied as an important knowledge source for speech recognition and understanding. In this paper, we are concerned with the question of exploiting prosody for language models to aid automatic speech recognition in the context of meetings. Using an automatic syllable detection algorithm, the syllable-based prosodic features are extracted to form the prosodic representat...
Improving the Robustness of Prosody Dependent Language Modeling Based on Prosody Syntax Dependence
This paper presents a novel approach that improves the robustness of prosody dependent language modeling by leveraging the dependence between prosody and syntax. A prosody dependent language model describes the joint probability distribution of concurrent word and prosody sequences and can be used to provide prior language constraints in a prosody dependent speech recognizer. Robust Maximum Lik...
Prosody Dependent Speech Recognition on Radio News
Does prosody help word recognition? Humans listening to natural prosody, as opposed to monotone or foreign prosody, are able to understand the content with lower cognitive load and higher accuracy [1]. For automatic Large Vocabulary Continuous Speech Recognition (LVCSR), the answer is not that straightforward. Even though successful word recognition and successful prosody recognition have been ...
Factored translation models for enriching spoken language translation with prosody
Key contextual information such as word prominence, emphasis, and contrast is typically ignored in speech-to-speech (S2S) translation due to the compartmentalized nature of the translation process. Conventional S2S systems rely on extracting prosody dependent cues from hypothesized (possibly erroneous) translation output using only words and syntax. In contrast, we propose the use of factored t...
Prosody dependent speech recognition with explicit duration modelling at intonational phrase boundaries
Does prosody help word recognition? In this paper, we propose a novel probabilistic framework in which word and phoneme are dependent on prosody in a way that improves word recognition. The prosody attribute that we investigate in this study is the duration lengthening effects of the speech segments in the vicinity of intonational phrase boundaries. Explicit Duration Hidden Markov Model (EDHMM)...
Publication date: 2007